Statistical decision making for optimal budget allocation in crowd labeling

Authors

  • Xi Chen
  • Qihang Lin
  • Dengyong Zhou
Abstract

Obtaining machine learning labels through commercial crowdsourcing services has become increasingly popular. Crowdsourcing workers (annotators) are paid for each label they provide, but the task requester usually has only a limited budget. Because data instances differ in labeling difficulty and workers differ in reliability, it is desirable to allocate the budget among instances and workers so that the overall labeling quality is maximized. In this paper, we formulate the budget allocation problem as a Bayesian Markov decision process (MDP), which conducts learning and decision making simultaneously. The optimal allocation policy can be obtained using the dynamic programming (DP) recurrence. However, DP quickly becomes computationally intractable as the problem size increases. To address this challenge, we propose a computationally efficient approximate policy called the optimistic knowledge gradient. Our method applies to both pull crowdsourcing marketplaces with homogeneous workers and push marketplaces with heterogeneous workers. It can also incorporate contextual information about instances when it is available. Experiments on both simulated and real data show that our policy achieves higher labeling quality than existing policies at the same budget level.
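
As a rough illustration of the kind of policy the abstract describes, the sketch below allocates a labeling budget one label at a time. It is not the authors' implementation. It assumes binary labels, homogeneous workers whose labels are Bernoulli draws from a per-instance parameter, a uniform Beta(1, 1) prior on each instance, and an "optimistic" score that takes the larger of the two possible one-step gains in expected accuracy. The function names (`prob_positive`, `accuracy`, `optimistic_gain`, `allocate`) and the simulated parameters are hypothetical.

```python
import math
import random

def prob_positive(a, b):
    """P(theta >= 0.5) for theta ~ Beta(a, b), with positive integer a and b.

    Uses the binomial-sum form of the regularized incomplete beta function.
    """
    n = a + b - 1
    return sum(math.comb(n, j) for j in range(a)) / 2 ** n

def accuracy(a, b):
    """Expected accuracy of the best guess for one instance under Beta(a, b)."""
    p = prob_positive(a, b)
    return max(p, 1.0 - p)

def optimistic_gain(a, b):
    """Assumed score: the larger of the gains from a positive or a negative label."""
    base = accuracy(a, b)
    return max(accuracy(a + 1, b) - base, accuracy(a, b + 1) - base)

def allocate(true_thetas, budget, rng):
    """Spend `budget` labels, each on the instance with the highest optimistic gain."""
    posteriors = [[1, 1] for _ in true_thetas]  # Beta(1, 1) prior per instance
    for _ in range(budget):
        gains = [optimistic_gain(a, b) for a, b in posteriors]
        i = max(range(len(posteriors)), key=gains.__getitem__)
        # Simulated worker: returns a positive label with probability theta_i.
        if rng.random() < true_thetas[i]:
            posteriors[i][0] += 1
        else:
            posteriors[i][1] += 1
    return posteriors

# Hypothetical instances: two clear-cut and one ambiguous.
posts = allocate([0.9, 0.55, 0.1], budget=30, rng=random.Random(0))
predicted = [a >= b for a, b in posts]  # final label: positive if a >= b
```

Each step scores all instances, so the cost per label is linear in the number of instances. That avoids the state-space growth that makes the exact DP recurrence intractable.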


Similar resources

Statistical Decision Making for Budget Allocation in Crowdsourcing

In this short paper, we briefly describe some recent progress on statistical decision making for budget allocation in crowdsourcing. We address the budget allocation problem for two important labeling tasks in crowdsourcing: the categorization labeling task and pairwise ranking aggregation. We also show the connections between our work and the “proactive learning” framework proposed by Jaime Ca...


Optimistic Knowledge Gradient Policy for Optimal Budget Allocation in Crowdsourcing

We consider the budget allocation problem in binary/multi-class crowd labeling where each label from the crowd has a certain cost. Since different instances have different ambiguities and different workers have different reliabilities, a fundamental challenge is how to allocate a pre-fixed amount of budget among instance-worker pairs so that the overall accuracy can be maximized. We start with ...


Large-Scale Markov Decision Problems with KL Control Cost and its Application to Crowdsourcing

We study average and total cost Markov decision problems with large state spaces. Since the computational and statistical cost of finding the optimal policy scales with the size of the state space, we focus on searching for near-optimality in a low-dimensional family of policies. In particular, we show that for problems with a Kullback-Leibler divergence cost function, we can recast policy optim...


Bayes-Optimal Effort Allocation in Crowdsourcing: Bounds and Index Policies

We consider effort allocation in crowdsourcing, where we wish to assign labeling tasks to imperfect homogeneous crowd workers to maximize overall accuracy in a continuous-time Bayesian setting, subject to budget and time constraints. The Bayes-optimal policy for this problem is the solution to a partially observable Markov decision process, but the curse of dimensionality renders the computatio...


Optimization of Urban Budget Allocation Based on Spatial Justice Indicators (Case: Mashhad Metropolis)

One of the main responsibilities of urban managers is to ensure that citizens have fair and equal access to urban services. Realizing the concept of spatial justice in practice provides citizens with appropriate services and lays the groundwork for reducing urban problems. Spatial justice is one of the main concepts of sustainable urban development. This ...



Journal:
  • Journal of Machine Learning Research

Volume 16

Publication date: 2015